How to use bindings and loops
Bindings and loops can be used to generalize and compress long scripts with a repeating structure. For example, if you want to run the same set of operations several times, you can generalize this with a loop. Bindings are created with the `let` command, while loops are created with the `for` command.
textblock
# Introduction to new functionality: Bindings and Loops
endblock
require no.ssb.fdb:26 as f
create-dataset ds
textblock
# Bindings
Bindings are values that are created with `let` and stored locally in the client. The values can either be used locally, or inserted into expressions before they are sent for analysis.
In practice, bindings work as global values (not to be confused with dataset variables) that can be referred to across datasets, which makes them useful for generalizing a script.
For example, you can make a script time-independent by creating a let-binding that contains a year. Instead of writing the year explicitly everywhere in the script, you refer to the binding. If you later want to run the script on a new vintage, you only need to change the value of the binding, rather than going through the entire script to change the year.
Use `help let` for the full syntax description.
You can bind both strings and numbers:
endblock
let string = "foo"
let twothousand = 2000
textblock
Mathematical expressions also work in bindings:
endblock
let approx_pi = 22/7
textblock
Functions that operate only on bindings, called *procedures*, can also be used. Use `help-procedure` to see the list of available procedures:
endblock
let pi = pi()
let one = cos(2 * pi())
textblock
The new operator `++` concatenates strings with each other or with numbers. Note that a number `++` a string gives a string:
endblock
let date = 2000 ++ "-01-01"
textblock
To refer to an existing binding, a `$` sign is placed in front:
endblock
let twothousand_value = $twothousand
let startdate = $twothousand ++ "-01-01"
textblock
You can also bind a symbol. A symbol is a name that you will refer to later, usually the name of a variable.
It does not have to be an existing variable; it can also be inserted into an expression where a new variable is to be created.
Unlike strings, symbols are written without quotes.
Example:
endblock
let siv = sivstate
textblock
Symbols can also be combined by concatenation. Here, the result is the symbol `twothousand`:
endblock
let twothousand_symbol = two ++ thousand
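textblock
To make the result types of `++` concrete, here is a hypothetical Python model of the operator. It is not microdata.no code: the `Symbol` class and the `concat` function are invented for illustration. The string rules follow the description above. The rule that a symbol combined with a number stays a symbol is an assumption, inferred from later examples such as `siv_ ++ $year`. Combinations this guide does not describe, such as a symbol with a string, are deliberately left out.

```python
# Hypothetical model of `++`; Symbol and concat are invented names, not microdata.no API.
class Symbol(str):
    """A bare name such as sivstate or siv_, written without quotes in a script."""


def concat(left, right):
    """Model of `left ++ right`, covering only the cases described in this guide."""
    has_symbol = isinstance(left, Symbol) or isinstance(right, Symbol)
    has_string = type(left) is str or type(right) is str
    text = str(left) + str(right)
    if has_symbol and not has_string:
        return Symbol(text)  # symbol ++ symbol, symbol ++ number (assumed)
    if has_string and not has_symbol:
        return text          # string ++ string, number ++ string
    raise TypeError("combination not covered by this sketch")


print(concat(2000, "-01-01"))                     # 2000-01-01
print(concat(Symbol("two"), Symbol("thousand")))  # twothousand
print(concat(Symbol("siv_"), 2000))               # siv_2000
```

The quoted/unquoted distinction is the key point: `"-01-01"` is a string, so the result is a string, while `two` and `thousand` are symbols, so the result is a symbol that can name a variable.
endblock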
textblock
A binding that holds a symbol can therefore be used as a variable name. The command below imports the variable under the name held by the binding `$siv`, which is `sivstate`:
endblock
import f/SIVSTANDFDT_SIVSTAND $startdate as $siv
textblock
Note that a binding used as the date in an `import` command, as above, is handled specially: it is stored as a string and converted to a date value when used there.
NB: Bindings can be clicked on in the interface to open a window that shows the origin of the binding. Try this with `$startdate` in the result window or in the dataset details. If the binding is derived from another binding, like `$twothousand`, that binding can in turn be clicked on to see its origin.
endblock
textblock
# Loops
Loops run one or more commands multiple times, with the value of one or more bindings changing in each iteration. In the loop head you define an iterator, which behaves just like a let-binding.
The difference is that you can give it several values, which are used one after the other.
Use `help for` for the full syntax description:
endblock
for iterator in 1 2 3
let double = $iterator * 2
end
textblock
Until the loop is closed with `end`, the loop head is equivalent to `let iterator = 1`, so the commands inside the loop run with that value.
When `end` is run, all the commands inside the loop are executed for the remaining values 2 and 3.
This allows you to run the first iteration of the loop and see the results before running the remaining iterations.
This can be useful when you are developing a script and want to check that you are getting the correct/expected results from costly operations.
Try this by either setting a breakpoint on the command inside the loop (clicking on the margin or `alt+enter` when the line is selected), or by executing the steps one by one in the command line.
endblock
textblock
You can define the iterator values in two ways: either as a list of numbers, strings and/or symbols (as above), or as a __numeric__ range `from : to`, where both ends are included:
endblock
for iterator in 1:3
let double = $iterator * 2
end
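textblock
If you are used to Python, note that the two range conventions differ. The sketch below is a Python analogy, not microdata.no code: `for iterator in 1:3` includes 3, whereas Python's `range()` excludes its stop value, so the stop has to be written as `3 + 1`.

```python
# Python analogy of:  for iterator in 1:3 / let double = $iterator * 2 / end
start, stop = 1, 3
doubles = [iterator * 2 for iterator in range(start, stop + 1)]  # +1: include the end value
print(doubles)  # [2, 4, 6]
```
endblock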
textblock
Bindings created inside the loop are local to each iteration and disappear afterwards, whereas variables created in the current dataset remain after the loop.
This makes it possible to import variables like this:
endblock
for year in 2000 : 2002
let siv_date = $year ++ "-01-01"
let siv_year = siv_ ++ $year
import f/SIVSTANDFDT_SIVSTAND $siv_date as $siv_year
end
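textblock
The loop above can be traced with the following Python analogy (not microdata.no code). It shows which date and variable name each iteration produces. The dict `dataset` is a hypothetical stand-in for the current dataset: only what is stored in it corresponds to what survives the loop, just as the imported variables do.

```python
# Python analogy of the import loop; `dataset` stands in for the current dataset.
dataset = {}

for year in range(2000, 2002 + 1):   # `2000 : 2002` includes both ends
    siv_date = str(year) + "-01-01"   # let siv_date = $year ++ "-01-01"
    siv_year = "siv_" + str(year)     # let siv_year = siv_ ++ $year
    dataset[siv_year] = siv_date      # import ... $siv_date as $siv_year

# Unlike microdata.no bindings, Python keeps siv_date and siv_year after the loop;
# in the script, only the imported variables remain.
print(dataset)
# {'siv_2000': '2000-01-01', 'siv_2001': '2001-01-01', 'siv_2002': '2002-01-01'}
```
endblock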
textblock
This import pattern is common.
Building a date with a specific month and day programmatically can be cumbersome, so we offer the `date_fmt` procedure to make this easier:
endblock
let dt1 = date_fmt(2000)
let dt2 = date_fmt(2000, 10)
let dt3 = date_fmt(2000, 10, 20)
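textblock
The following hypothetical Python model may help you read the three calls above. It is not the real implementation of `date_fmt`. It assumes that an omitted month or day defaults to 1 and that the result is a `YYYY-MM-DD` string. This guide only implies the first case, since `date_fmt(2000)` is later used in place of `2000-01-01`.

```python
# Hypothetical model of date_fmt; defaults of month=1 and day=1 are assumptions.
from datetime import date


def date_fmt(year, month=1, day=1):
    """Return an ISO-style YYYY-MM-DD string (assumed output format)."""
    return date(year, month, day).isoformat()


print(date_fmt(2000))          # 2000-01-01
print(date_fmt(2000, 10))      # 2000-10-01
print(date_fmt(2000, 10, 20))  # 2000-10-20
```

Using `datetime.date` also rejects impossible dates such as 30 February; whether the real procedure does the same is not stated here.
endblock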
textblock
Loops can accumulate many temporary bindings: in the loop example above, `siv_date` and `siv_year` are each used only once. To avoid such temporary let-bindings, you can write binding expressions directly inside the command. There are several ways to do this:
1) Where a command creates a new variable, the variable name can be given directly as a binding expression. This applies to `import`, `generate` and other commands; look for `name` among the command parameters in the help text.
This import statement therefore corresponds to the one in the loop above:
endblock
let year = 2000
import f/SIVSTANDFDT_SIVSTAND 2000-01-01 as siv_inline ++ $year
textblock
2) The date expression in an `import` statement is treated specially, so that you can use a date (as usual), a binding (as shown above), or a procedure directly. This is mainly intended for using the `date_fmt` procedure to specify a date directly. The import statement above can then be simplified to:
endblock
import f/SIVSTANDFDT_SIVSTAND date_fmt(2000) as siv_proc ++ $year
textblock
3) In general expressions, such as the expression in `generate` or the condition after `if`, bindings cannot be used as freely as in `name`.
This is because it is not possible to tell which parts belong to the binding and which belong to the expression itself.
The `++` operator is an exception: since it is only defined for bindings, it is evaluated even inside another expression:
endblock
generate married = 1 if siv_ ++ $year + 2 == 1
textblock
Note that `++` has lower precedence than `+` and `-`. This means that
`siv_ ++ $year + 2 == 1`
is the same as
`(siv_ ++ ($year + 2)) == 1`.
Keep this in mind if you want to use the variable itself in an arithmetic expression.
For example, to add 1 to the variable named `siv_{$year}`, write:
`(siv_ ++ $year) + 1`
endblock
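textblock
The difference between the two groupings can be sketched in Python. This is an analogy, not microdata.no code, with `year = 2000` as in the binding above. In the first case, the arithmetic only decides *which* variable is referred to. In the second, `+ 1` applies to the variable's values. The small dict is a stand-in dataset with made-up codes, purely for illustration.

```python
# Python analogy of `++` precedence; not microdata.no code.
year = 2000

# siv_ ++ $year + 2 == 1  is grouped as  (siv_ ++ ($year + 2)) == 1
grouped_name = "siv_" + str(year + 2)
print(grouped_name)  # siv_2002

# (siv_ ++ $year) + 1 builds the name first; + 1 then applies to the values.
dataset = {"siv_2000": [1, 2, 1]}  # made-up values, for illustration only
plus_one = [value + 1 for value in dataset["siv_" + str(year)]]
print(plus_one)  # [2, 3, 2]
```
endblock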
textblock
Combining the first two techniques gives a shorter version of the loop:
endblock
for year in 2000 : 2002
import f/SIVSTANDFDT_SIVSTAND date_fmt($year) as inline_siv ++ $year
end
textblock
Use inline bindings with care, since overusing them makes the code harder to read.
Error messages for inline bindings written in the command line are also less precise, since it is harder to match the evaluated expression to the place where it is inserted.
Named bindings created with `let` also document the intent of a value directly in the program, especially when the value is a constant that comes from outside the system.
endblock
textblock
## Example
A typical use of bindings and loops: importing a set of variables measured over several years for a random sample of resident individuals in a given age group:
endblock
let start_year = 2020
let start_date = date_fmt($start_year + 1)
let minage = 40
let maxage = 50
create-dataset totalpop
import f/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
sample 0.1 12345
generate age = $start_year - int(birthdate/100)
import f/BEFOLKNING_STATUSKODE $start_date as regstat
keep if regstat == '1' & age >= $minage & age <= $maxage
histogram age, discrete freq
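textblock
The selection step above can be mirrored in a short hypothetical Python sketch (not microdata.no code). It assumes that `birthdate` is an integer of the form YYYYMM, so dividing by 100 and truncating gives the birth year, as `int(birthdate/100)` does in the script. The `keep` function is an invented name that returns whether a person passes the `keep if` condition.

```python
# Hypothetical sketch of the age filter; birthdate format YYYYMM is an assumption.
start_year, minage, maxage = 2020, 40, 50


def keep(birthdate: int, regstat: str) -> bool:
    """Mirror of: keep if regstat == '1' & age >= $minage & age <= $maxage."""
    age = start_year - birthdate // 100
    return regstat == "1" and minage <= age <= maxage


print(keep(197503, "1"))  # True  (age 45)
print(keep(196912, "1"))  # False (age 51)
print(keep(197503, "3"))  # False (status other than '1')
```
endblock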
//Alternative 1
for i in 2016 : 2020
let idate = date_fmt($i, 12, 31)
let yy = $i - 2000
let var = wage ++ $yy
import f/INNTEKT_LONN $idate as $var
end
//Alternative 2
for i in 2016 : 2020
import f/INNTEKT_LONN date_fmt($i, 12, 31) as wage ++ $i - 2000 ++ "_2"
end
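textblock
Both alternatives import the same five years, but under different names. The Python analogy below (not microdata.no code) lists the names each loop produces. Alternative 2 relies on `-` binding tighter than `++`, so `wage ++ $i - 2000 ++ "_2"` is read as `wage ++ ($i - 2000) ++ "_2"`.

```python
# Python analogy of the variable names created by the two alternatives.
years = range(2016, 2020 + 1)                          # `2016 : 2020` includes both ends
alt1 = ["wage" + str(i - 2000) for i in years]         # let yy = $i - 2000; wage ++ $yy
alt2 = ["wage" + str(i - 2000) + "_2" for i in years]  # wage ++ ($i - 2000) ++ "_2"
print(alt1)  # ['wage16', 'wage17', 'wage18', 'wage19', 'wage20']
print(alt2)  # ['wage16_2', 'wage17_2', 'wage18_2', 'wage19_2', 'wage20_2']
```
endblock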
textblock
Currently, it is not possible to create bindings consisting of lists of symbols (variables), e.g. `let vars = gender age education`. This is under development.
endblock
textblock
## Advanced loops
Loops can also iterate over several values in parallel, nest over several values, or both at the same time. This is achieved with a generator syntax.
Iterating over several values in parallel: for example, to import several dates and give them names that cannot be derived from the year:
endblock
for year, sivname in 2000 : 2002, first second third
import f/SIVSTANDFDT_SIVSTAND date_fmt($year) as $sivname
end
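textblock
Parallel iteration pairs the value lists position by position, much like Python's `zip()`. The sketch below is an analogy only, not microdata.no code.

```python
# Python analogy of:  for year, sivname in 2000 : 2002, first second third
years = range(2000, 2002 + 1)
sivnames = ["first", "second", "third"]
pairs = list(zip(years, sivnames))  # values are consumed in parallel
print(pairs)  # [(2000, 'first'), (2001, 'second'), (2002, 'third')]
```

Note that `zip()` silently stops at the shorter list. This guide does not say how microdata.no handles lists of unequal length, so keep them the same length.
endblock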
textblock
Nested loops: to repeat this several times and generate similar variables with other names, separate the generators with `;`. This is the same as nesting the loop after `;` inside the loop before it.
The loop below therefore iterates over the values 2000 and first, 2000 and second, 2001 and first, and so on:
endblock
for year in 2000 : 2002; sivname in first second
import f/SIVSTANDFDT_SIVSTAND date_fmt($year) as $sivname ++ $year ++ nested
end
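textblock
In Python terms, the `;` generator behaves like a Cartesian product, with the first generator as the outer loop. The analogy below (not microdata.no code) uses `itertools.product` to reproduce the iteration order and the resulting names.

```python
# Python analogy of:  for year in 2000 : 2002; sivname in first second
from itertools import product

years = range(2000, 2002 + 1)
sivnames = ["first", "second"]
# product() varies the last iterable fastest, i.e. sivname is the inner loop
names = [sivname + str(year) + "nested" for year, sivname in product(years, sivnames)]
print(names)
# ['first2000nested', 'second2000nested', 'first2001nested',
#  'second2001nested', 'first2002nested', 'second2002nested']
```
endblock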
textblock
These two techniques can be combined to iterate over several values in a nested loop:
endblock
for year, color in 2000 : 2002, blue yellow green; sivname in first second
import f/SIVSTANDFDT_SIVSTAND date_fmt($year) as $sivname ++ $year ++ $color
end
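textblock
The combined form can be read as a nested loop whose outer level is a parallel iteration. The Python analogy below (not microdata.no code) expresses this as `product(zip(...), ...)` and lists the variable names the loop above would create.

```python
# Python analogy of:
#   for year, color in 2000 : 2002, blue yellow green; sivname in first second
from itertools import product

years = range(2000, 2002 + 1)
colors = ["blue", "yellow", "green"]
sivnames = ["first", "second"]
# Outer level: year and color in parallel (zip); inner level: sivname.
names = [
    sivname + str(year) + color
    for (year, color), sivname in product(zip(years, colors), sivnames)
]
print(names)
# ['first2000blue', 'second2000blue', 'first2001yellow',
#  'second2001yellow', 'first2002green', 'second2002green']
```
endblock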